Aminet 4


/ Aminet 4 / Aminet 4 - November 1994.iso / aminet / comm / uucp / wcnews_1_0_30.lha / man / dbz3z.man


Text File | 1994-01-04 | 16KB | 397 lines

DBZ(3Z)                                                        DBZ(3Z)

NAME
dbminit, fetch, store, dbmclose - somewhat dbm-compatible database routines
dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug - database routines

SYNOPSIS
#include <dbz.h>

dbminit(base)
char *base;

datum
fetch(key)
datum key;

store(key, value)
datum key;
datum value;

dbmclose()

dbzfresh(base, size, fieldsep, cmap, tagmask)
char *base;
long size;
int fieldsep;
int cmap;
long tagmask;

dbzagain(base, oldbase)
char *base;
char *oldbase;

datum
dbzfetch(key)
datum key;

dbzstore(key, value)
datum key;
datum value;

dbzsync()

long
dbzsize(nentries)
long nentries;

dbzincore(newvalue)

dbzcancel()

dbzdebug(newvalue)

3 Feb 1991

DESCRIPTION
These functions provide an indexing system for rapid random access to a text file (the base file). Subject to certain constraints, they are call-compatible with dbm(3), although they also provide some extensions. (Note that they are not file-compatible with dbm or any variant thereof.)

In principle, dbz stores key-value pairs, where both key and value are arbitrary sequences of bytes, specified to the functions by values of type datum, typedef'd in the header file to be a structure with members dptr (a value of type char * pointing to the bytes) and dsize (a value of type int indicating how long the byte sequence is).

In practice, dbz is more restricted than dbm. A dbz database must be an index into a base file, with the database values being fseek(3) offsets into the base file.
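The following toy, in-memory index is a sketch of that model, not dbz's file format. The table holds only offsets into the base text, and each candidate offset is confirmed by comparing the key against the base text itself. The datum definition mirrors the description above. The table size, the hash function, linear probing, and the offset-plus-one encoding (so that 0 can mark an empty slot) are all illustrative assumptions, and the toy_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the datum structure described above (normally from dbz.h). */
typedef struct {
    char *dptr;   /* pointer to the bytes */
    int dsize;    /* number of bytes */
} datum;

#define TOY_SLOTS 7   /* hypothetical table size; prime, as recommended */

/* Each slot holds (offset + 1) so that 0 can mean "empty". */
static long toy_table[TOY_SLOTS];

/* Illustrative string hash, chosen for brevity. */
static unsigned long toy_hash(const char *p, int n)
{
    unsigned long h = 0;
    while (n-- > 0)
        h = h * 31 + (unsigned char)*p++;
    return h;
}

/* Record that key appears at offset off in the base text.
   Returns 0 on success, -1 if the table is full. Duplicates are NOT checked. */
static int toy_store(datum key, long off)
{
    size_t i = toy_hash(key.dptr, key.dsize) % TOY_SLOTS;
    for (int probes = 0; probes < TOY_SLOTS; probes++) {
        if (toy_table[i] == 0) {
            toy_table[i] = off + 1;
            return 0;
        }
        i = (i + 1) % TOY_SLOTS;   /* linear probing */
    }
    return -1;
}

/* Return the offset of key in base, or -1 if absent. Only offsets are
   stored, so every candidate is confirmed against the base text. */
static long toy_fetch(const char *base, size_t baselen, datum key)
{
    size_t i = toy_hash(key.dptr, key.dsize) % TOY_SLOTS;
    for (int probes = 0; probes < TOY_SLOTS; probes++) {
        if (toy_table[i] == 0)
            return -1;             /* empty slot: key not present */
        size_t off = (size_t)(toy_table[i] - 1);
        if (off + (size_t)key.dsize <= baselen &&
            memcmp(base + off, key.dptr, (size_t)key.dsize) == 0)
            return (long)off;      /* confirmed hit */
        i = (i + 1) % TOY_SLOTS;
    }
    return -1;
}
```

Take the base text "<a@x>\tone\n<b@y>\ttwo\n". After storing <a@x> at offset 0 and <b@y> at offset 10, toy_fetch returns 10 for <b@y> and -1 for an absent key. Like (dbz)store, the toy does not detect duplicate keys, so a caller would fetch before storing.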
Each such value must "point to" a place in the base file where the corresponding key sequence is found. A key can be no longer than DBZMAXKEY bytes (DBZMAXKEY is a constant defined in the header file). No key can be an initial subsequence of another, which in most applications requires that keys be either bracketed or terminated in some way (see the discussion of the fieldsep parameter of dbzfresh, below, for a fine point on terminators).

Dbminit opens a database, an index into the base file base, consisting of files base.dir and base.pag, which must already exist. (If the database is new, they should be zero-length files.) Subsequent accesses go to that database until dbmclose is called to close the database. The base file need not exist at the time of the dbminit, but it must exist before accesses are attempted.

Fetch searches the database for the specified key, returning the corresponding value if any.

Store stores the key-value pair in the database. Store will fail unless the database files are writable. See below for a complication arising from case mapping.

Dbzfresh is a variant of dbminit for creating a new database with more control over details. Unlike with dbminit, the database files need not exist: they will be created if necessary, and truncated in any case.

Dbzfresh's size parameter specifies the size of the first hash table within the database, in key-value pairs. Performance will be best if size is a prime number and the number of key-value pairs stored in the database does not exceed about 2/3 of size. (The dbzsize function, given the expected number of key-value pairs, will suggest a database size that meets these criteria.)
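Here is a rough stand-in for what dbzsize is described as doing: it finds the smallest prime table size that keeps the expected number of key-value pairs at or below about 2/3 of the table. suggest_size and is_prime are hypothetical names, and the real dbzsize may choose differently (for example, with more headroom).

```c
/* Trial division; adequate for hash-table sizes. */
static int is_prime(long n)
{
    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (long d = 3; d <= n / d; d += 2)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical stand-in for dbzsize(): the smallest prime size such that
   nentries occupies no more than 2/3 of the table. */
static long suggest_size(long nentries)
{
    long n = nentries < 1 ? 1 : nentries;
    long size = (n * 3 + 1) / 2;   /* ceil(1.5 * n) */
    while (!is_prime(size))
        size++;
    return size;
}
```

For 100 expected entries the first candidate is 150, which is not prime, so suggest_size(100) returns 151.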
Assuming that an fseek offset is 4 bytes, the .pag file will be 4*size bytes (the .dir file is tiny and roughly constant in size) until the number of key-value pairs exceeds about 80% of size. (Nothing awful will happen if the database grows beyond 100% of size, but accesses will slow down somewhat and the .pag file will grow somewhat.)

Dbzfresh's fieldsep parameter specifies the field separator in the base file. If this is not NUL (0), and the last character of a key argument is NUL, that NUL compares equal to either a NUL or a fieldsep in the base file. This permits use of NUL to terminate key strings without requiring that NULs appear in the base file. The fieldsep of a database created with dbminit is the horizontal-tab character.

For use in news systems, various forms of case mapping (e.g. upper-case to lower-case) in keys are available. The cmap parameter to dbzfresh is a single character specifying which of several mapping algorithms to use. Available algorithms are:

0      case-sensitive: no case mapping
B      same as 0
NUL    same as 0
=      case-insensitive: upper-case and lower-case equivalent
b      same as =
C      RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions) and case-insensitive after
?      whatever the local default is, normally C

Mapping algorithm 0 (no mapping) is faster than the others and is overwhelmingly the correct choice for most applications. Unless compatibility constraints interfere, it is more efficient to pre-map the keys, storing mapped keys in the base file, than to have dbz do the mapping on every search.

For historical reasons, fetch and store expect their key arguments to be pre-mapped, but expect unmapped keys in the base file. Dbzfetch and dbzstore do the same jobs but handle all case mapping internally, so the customer need not worry about it.
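The fieldsep comparison rule described above can be sketched as a byte comparison in which a trailing NUL in the key is matched specially. key_matches is a hypothetical helper, not part of the dbz interface, and it ignores case mapping.

```c
#include <stddef.h>

/* Compare a key against the text at the start of base. If fieldsep is not
   NUL and the key ends in NUL, that NUL matches either NUL or fieldsep. */
static int key_matches(const char *key, size_t keylen,
                       const char *base, size_t baselen, int fieldsep)
{
    if (keylen == 0 || keylen > baselen)
        return 0;
    size_t body = keylen;
    if (fieldsep != '\0' && key[keylen - 1] == '\0')
        body = keylen - 1;            /* handle the terminator separately */
    for (size_t i = 0; i < body; i++)
        if (key[i] != base[i])
            return 0;
    if (body == keylen)
        return 1;                     /* plain byte-for-byte match */
    return base[body] == '\0' || base[body] == (char)fieldsep;
}
```

With fieldsep '\t', the NUL-terminated key "<a@x>" (6 bytes counting the NUL) matches the base-file text "<a@x>\tone". The shorter key "<a@" does not match, because the byte after it is 'x'. This is how a terminator keeps one key from being an initial subsequence of another.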
Dbz stores only the database values in its files, relying on reference to the base file to confirm a hit on a key. References to the base file can be minimized, greatly speeding up searches, if a little bit of information about the keys can be stored in the dbz files. This is "free" if there are some unused bits in an fseek offset, so that the offset can be tagged with some information about the key. The tagmask parameter of dbzfresh allows specifying the location of unused bits. Tagmask should be a mask with one group of contiguous 1 bits. The bits in the mask should be unused (0) in most offsets. The bit immediately above the mask (the flag bit) should be unused (0) in all offsets; (dbz)store will reject attempts to store a key-value pair in which the value has the flag bit on. Apart from this restriction, tagging is invisible to the user. As a special case, a tagmask of 1 means "no tagging", for use with enormous base files or on systems with unusual offset representations.

A size of 0 given to dbzfresh is synonymous with the local default; the normal default is suitable for tables of 90-100,000 key-value pairs. A cmap of 0 (NUL) is synonymous with the character 0, signifying no case mapping (note that the character ? specifies the local default mapping, normally C). A tagmask of 0 is synonymous with the local default tag mask, normally 0x7f000000 (specifying the top bit in a 32-bit offset as the flag bit, and the next 7 bits as the mask, which is suitable for base files up to circa 24MB). Calling dbminit(name) with the database files empty is equivalent to calling dbzfresh(name,0,'\t','?',0).

When databases are regenerated periodically, as in news, it is simplest to pick the parameters for a new database based on the old one.
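The tagmask arithmetic described above can be sketched as follows, assuming offsets are held in an unsigned long at least 32 bits wide. The helper names are hypothetical, and the sketch does not reproduce how dbz derives tags from keys. It only shows how the flag bit relates to the mask and how tag bits could be folded into an offset whose masked bits are zero. The special tagmask of 1 ("no tagging") is not handled.

```c
/* All helpers assume tagmask is nonzero and is one group of contiguous 1 bits. */

/* The flag bit is the bit immediately above the mask: adding the mask's
   lowest set bit to a contiguous mask carries into exactly that bit. */
static unsigned long flag_bit(unsigned long tagmask)
{
    return tagmask + (tagmask & (~tagmask + 1));
}

/* The flag bit must be 0 in every offset that is stored. */
static int offset_storable(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) == 0;
}

/* An offset can carry a tag only if all of its masked bits are 0. */
static int offset_taggable(unsigned long offset, unsigned long tagmask)
{
    return (offset & tagmask) == 0;
}

/* Place the low bits of some key-derived value into the mask bits. */
static unsigned long add_tag(unsigned long offset, unsigned long keybits,
                             unsigned long tagmask)
{
    unsigned int shift = 0;
    while (((tagmask >> shift) & 1UL) == 0)
        shift++;                      /* position of the mask's lowest bit */
    return offset | ((keybits << shift) & tagmask);
}
```

With the default mask 0x7f000000, flag_bit returns 0x80000000 and the tag occupies bits 24 through 30. As a result, only offsets below 0x01000000 (16MB) have every mask bit clear.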
Basing a new database on the old one also permits some memory of past sizes of the old database, so that a new database size can be chosen to cover expected fluctuations. Dbzagain is a variant of dbminit for creating a new database as a new generation of an old database. The database files for oldbase must exist. Dbzagain is equivalent to calling dbzfresh with the same field separator, case mapping, and tag mask as the old database, and a size equal to the result of applying dbzsize to the largest number of entries in the oldbase database and its previous 10 generations.

When many accesses are being done by the same program, dbz is massively faster if its first hash table is in memory. If an internal flag is 1, an attempt is made to read the table in when the database is opened, and dbmclose writes it out to disk again (if it was read successfully and has been modified). Dbzincore sets the flag to newvalue (which should be 0 or 1) and returns the previous value; this does not affect the status of a database that has already been opened. The default is 0. The attempt to read the table in may fail due to memory shortage; in this case dbz quietly falls back on its default behavior. Stores to an in-memory database are not (in general) written out to the file until dbmclose or dbzsync, so if robustness in the presence of crashes or concurrent accesses is crucial, in-memory databases should probably be avoided.

Dbzsync causes all buffers etc. to be flushed out to the files. It is typically used as a precaution against crashes or concurrent accesses when a dbz-using process will be running for a long time. It is a somewhat expensive operation, especially for an in-memory database.

Dbzcancel cancels any pending writes from buffers. This is typically useful only for in-core databases, since writes are otherwise done immediately.
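The write-back behavior described for in-memory tables can be modeled with a dirty flag. This sketch is not dbz's code. struct incore and the incore_* functions are hypothetical and stand in only loosely for the in-core table, dbzsync, and dbzcancel.

```c
#include <stdio.h>

/* Hypothetical model of an in-core first hash table. */
struct incore {
    long *table;      /* slot values held in memory */
    size_t nslots;
    int dirty;        /* modified since the last write-back? */
};

/* Modify a slot in memory only; nothing is written yet. */
static void incore_set(struct incore *ic, size_t slot, long value)
{
    ic->table[slot] = value;
    ic->dirty = 1;
}

/* Write the whole table back if it changed (compare dbzsync).
   Returns 0 on success, -1 on I/O failure. */
static int incore_sync(struct incore *ic, FILE *pag)
{
    if (!ic->dirty)
        return 0;
    if (fseek(pag, 0L, SEEK_SET) != 0 ||
        fwrite(ic->table, sizeof ic->table[0], ic->nslots, pag) != ic->nslots ||
        fflush(pag) != 0)
        return -1;
    ic->dirty = 0;
    return 0;
}

/* Forget pending changes so a later sync or close writes nothing
   (compare dbzcancel). The in-memory values are left as they are. */
static void incore_cancel(struct incore *ic)
{
    ic->dirty = 0;
}
```

A change made with incore_set reaches the file only when incore_sync runs. Calling incore_cancel first clears the flag, so the change is never written.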
The main purpose of dbzcancel is to let a child process, in the wake of a fork, do a dbmclose without writing its parent's data to disk.

If dbz has been compiled with debugging facilities available (which makes it bigger and a bit slower), dbzdebug alters the value (and returns the previous value) of an internal flag which (when 1; default is 0) causes verbose and cryptic debugging output on standard output.

Concurrent reading of databases is fairly safe, but there is no (inter)locking, so concurrent updating is not.

The database files include a record of the byte order of the processor creating the database, and accesses by processors with different byte order will work, although they will be slightly slower. Byte order is preserved by dbzagain. However, agreement on the size and internal structure of an fseek offset is necessary, as is consensus on the character set.

An open database occupies three stdio streams and their corresponding file descriptors; a fourth is needed for an in-memory database. Apart from stdio buffers, memory consumption is negligible except for in-memory databases.

SEE ALSO
dbz(1), dbm(3)

DIAGNOSTICS
Functions returning int values return 0 for success, -1 for failure. Functions returning datum values return a value with dptr set to NULL for failure. Dbminit attempts to have errno set plausibly on return, but otherwise this is not guaranteed. An errno of EDOM from dbminit indicates that the database did not appear to be in dbz format.

HISTORY
The original dbz was written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us). Later contributions by David Butler and Mark Moraes. Extensive reworking, including this documentation, by Henry Spencer (henry@zoo.toronto.edu) as part of the C News project. Hashing function by Peter Honeyman.
BUGS
The dptr members of returned datum values point to static storage which is overwritten by later calls.

Unlike dbm, dbz will misbehave if an existing key-value pair is "overwritten" by a new (dbz)store with the same key. The user is responsible for avoiding this by using (dbz)fetch first to check for duplicates; an internal optimization remembers the result of the first search so there is minimal overhead in this.

Waiting until after dbminit to bring the base file into existence will fail if chdir(2) has been used meanwhile.

The RFC822 case mapper implements only a first approximation to the hideously-complex RFC822 case rules.

The prime finder in dbzsize is not particularly quick.

Should implement the dbm functions delete, firstkey, and nextkey.

On C implementations which trap integer overflow, dbz will refuse to (dbz)store an fseek offset equal to the greatest representable positive number, as this would cause overflow in the biased representation used.

Dbzagain perhaps ought to notice when many offsets in the old database were too big for tagging, and shrink the tag mask to match.

Marking dbz's file descriptors close-on-exec would be a better approach to the problem dbzcancel tries to address, but that's harder to do portably.
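On POSIX systems, marking the descriptor underneath a stdio stream close-on-exec looks roughly like the sketch below. dbz does not do this, as noted above. set_cloexec is a hypothetical helper, and fcntl with FD_CLOEXEC is POSIX rather than ISO C, which is part of the portability difficulty.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fileno() and fcntl() */
#include <fcntl.h>
#include <stdio.h>

/* Mark the file descriptor behind a stdio stream close-on-exec.
   Returns 0 on success, -1 on failure (errno left as set by the failing call). */
static int set_cloexec(FILE *fp)
{
    int fd = fileno(fp);
    if (fd == -1)
        return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```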